Steps data analysis

Introduction

The variables included in the data set are:

Field Description
AmountWeek How many cups of coffee do you typically consume weekly?
AmountOutMonth How frequently do you drink coffee out of home per month on average?
MoneyCoffee How much money on average do you estimate you spend on coffee per month?
MoneyGroceries How much on average do you spend on general groceries per month?
Machine How do you brew your coffee at home?
Brand change How often do you switch between coffee brands?
Purchase location Where do you usually purchase your coffee?
Supermarket_Positive_Reasons When you purchase coffee from the supermarket what are your main reasons for doing so?
Supermarket_Negative_Reasons What would be reasons why you would not purchase coffee from the supermarket?
Criteria_Type_Coffee What are your main criteria or evaluation points for choosing the type of coffee?
KnowledgeCoffee How would you describe your knowledge level regarding coffee in general?
Purchase_Price I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Sustainability I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Fairtrade I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Packaging I believe that the ____ is important to my decision on which coffee to purchase.
Frequency_Specialty How often do you drink specialty coffee?
Subscription_Likely How likely are you to have an online subscription for (specialty) coffee?
Subscription_Not_Likely What is the number one reason why you would be hesitant?
App_Likely How likely are you to value and use an app for your online subscription?
Gender What is your gender?
AgeCategory What is your age category?
Occupation What is your occupational status?
Education What level of education have you completed?
Home How would you describe the place you currently live in?

Univariate descriptions - Categorical variables

Age category

Age Category Absolute Relative
< 18 1 0.50%
18-25 60 29.85%
25-45 82 40.80%
45-60 47 23.38%
> 60 11 5.47%
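
The absolute and relative columns in these frequency tables can be reproduced with a simple count. The following is a minimal Python sketch, not the code used for this report; the individual responses are rebuilt from the counts above, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(responses):
    """Return {label: (absolute count, relative share in percent)}."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}

# Assumption: responses rebuilt from the Age category counts reported above.
age_counts = {"< 18": 1, "18-25": 60, "25-45": 82, "45-60": 47, "> 60": 11}
responses = [label for label, n in age_counts.items() for _ in range(n)]

age_table = frequency_table(responses)
for label, (absolute, relative) in age_table.items():
    print(f"{label:<6} {absolute:>4} {relative:>6.2f}%")
```

The same helper applies to any of the categorical variables in this section.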

Home

Home Absolute Relative
Rural (Town) 22 10.95%
Suburbs 17 8.46%
Urban (City) 162 80.60%

Gender

Gender Absolute Relative
Other (Dutch: Anders) 2 1.00%
Female 130 64.68%
Male 69 34.33%

Education

Education Absolute Relative
Associate degree 18 8.96%
Bachelor’s degree 112 55.72%
High school 20 9.95%
Master’s degree 47 23.38%
PhD 3 1.49%
Secondary school 1 0.50%

Machine

Machine Absolute Relative
Aeropress 1 0.50%
CupMachine 68 34.17%
Espresso machine 63 31.66%
Filter machine 40 20.10%
French press 7 3.52%
Instant coffee 4 2.01%
Moka pot 13 6.53%
V60 3 1.51%

Brand change

Brand change Absolute Relative
Sometimes (Dutch: Af en toe) 69 34.33%
Don’t check the brand 1 0.50%
Every time 2 1.00%
Very often (Dutch: Heel vaak) 7 3.48%
Never 15 7.46%
Never (Dutch: Nooit) 53 26.37%
Sometimes 43 21.39%
Very often 11 5.47%
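
Several answers in the table above were recorded in both Dutch and English (for example "Nooit" and "Never"), which splits one answer option across two rows. A minimal Python sketch of merging them before tabulating follows. The mapping `DUTCH_TO_ENGLISH` and the helper `merge_labels` are illustrative assumptions, and the counts are copied from the table.

```python
from collections import Counter

# Assumption: these Dutch labels are the same answer options as the English ones.
DUTCH_TO_ENGLISH = {
    "Af en toe": "Sometimes",
    "Heel vaak": "Very often",
    "Nooit": "Never",
}

def merge_labels(counts, mapping):
    """Add the counts of translated labels to their English equivalents."""
    merged = Counter()
    for label, n in counts.items():
        merged[mapping.get(label, label)] += n
    return merged

brand_counts = {
    "Af en toe": 69, "Don’t check the brand": 1, "Every time": 2,
    "Heel vaak": 7, "Never": 15, "Nooit": 53, "Sometimes": 43, "Very often": 11,
}
merged = merge_labels(brand_counts, DUTCH_TO_ENGLISH)
total = sum(merged.values())
for label, n in merged.most_common():
    print(f"{label:<22} {n:>4} {100 * n / total:6.2f}%")
```

Under this mapping, "Sometimes" covers 112 of the 201 respondents, "Never" 68, and "Very often" 18. The same approach would apply to other mixed-language labels such as "De supermarkt" and "The supermarket".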

Purchase Method

Purchase Method Absolute Relative
Amazon 1 0.50%
The supermarket (Dutch: De supermarkt) 16 7.96%
Direct with the coffee maker 1 0.50%
E-commerce 26 12.94%
Fairtrade Store 1 0.50%
Online subscription 9 4.48%
Online website / e-commerce 7 3.48%
Specialty stores or cafés 22 10.95%
The supermarket 118 58.71%

Purchase decisions 1-5

Price
Purchase decision Price Absolute Relative
1 23 11.44%
2 51 25.37%
3 47 23.38%
4 41 20.40%
5 39 19.40%


Univariate descriptions - Numerical variables

Amount of coffee consumed weekly

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   12.00   17.00   19.38   25.00   70.00 

Amount consumed out of home per month

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   2.000   5.000   7.811  10.000  40.000 

Money coffee

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   10.00   20.00   25.77   35.00  120.00 

Money groceries

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0   150.0   200.0   249.5   300.0   900.0 

Subscription likely

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   3.000   3.771   6.000  10.000 

App likely

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   3.000   4.244   7.000  10.000 
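
The summaries above (minimum, quartiles, median, mean, maximum) have the layout of R's `summary()` output. As a hedged illustration, the sketch below computes the same quantities in Python. The sample values are hypothetical, not survey data, and `six_number_summary` is an illustrative name. The `inclusive` method of `statistics.quantiles` interpolates linearly between order statistics, which is also how R's default quantile type behaves.

```python
import statistics

def six_number_summary(values):
    """Min, quartiles, median, mean and max of a numeric sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median,
        "mean": statistics.mean(values), "q3": q3, "max": max(values),
    }

# Hypothetical right-skewed sample (not the survey data).
sample = [2, 4, 4, 5, 7, 9, 10, 12, 40]
summary = six_number_summary(sample)
print(summary)

# A mean well above the median points to a long right tail.
skewed_right = summary["mean"] > summary["median"]
```

In the summaries above, the mean exceeds the median for every variable (for example 19.38 versus 17.00 cups per week), which is consistent with right-skewed distributions.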


Boxplots

Non-parametric testing

H_0: There is no association between the two variables.
H_a: There is an association between the two variables.
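
These hypotheses are tested below with Pearson's chi-squared test. As a sketch of what that test computes, the following Python snippet derives the expected counts, the test statistic, and the degrees of freedom from a contingency table. The table is hypothetical and `chi_squared` is an illustrative name, not part of the original analysis.

```python
def chi_squared(table):
    """Pearson chi-squared statistic, degrees of freedom and expected counts.

    `table` is a list of rows holding observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical 2 x 3 table of observed counts (not survey data).
observed = [[10, 20, 30],
            [20, 20, 20]]
statistic, df, expected = chi_squared(observed)
print(round(statistic, 4), df)

# Share of cells with an expected count below 5, a common rule-of-thumb check.
cells = [e for row in expected for e in row]
share_small = sum(e < 5 for e in cells) / len(cells)
```

The degrees of freedom equal (rows - 1) x (columns - 1). AgeCategory has 5 levels, so the reported df = 132 for AmountWeek and AgeCategory implies that AmountWeek was treated as a categorical variable with 34 distinct values (33 x 4 = 132). With roughly 200 respondents spread over 34 x 5 = 170 cells, the average expected count is close to 1, well below the usual rule of thumb of 5, which may explain why a simulated p-value is reported next to the asymptotic one.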

Age - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and AgeCategory
X-squared = 230.83, df = 132, p-value = 0.0000002273

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and AgeCategory
X-squared = 230.83, df = NA, p-value = 0.03194
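
The second test in each pair reports a simulated p-value based on 500 replicates. One way to sketch the idea in Python is a permutation test: shuffling one variable keeps both sets of marginal totals fixed, and the p-value is the share of shuffled tables whose statistic is at least as large as the observed one. The helper names and the toy data are hypothetical. The formula `(1 + exceed) / (replicates + 1)` is an assumption about how the reported values were computed; it is consistent with them (for example 0.03194 is 16/501).

```python
import random
from collections import Counter

def chi_squared_from_pairs(xs, ys):
    """Pearson chi-squared statistic for two paired categorical samples."""
    n = len(xs)
    x_totals, y_totals = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    return sum(
        (observed[(x, y)] - x_totals[x] * y_totals[y] / n) ** 2
        / (x_totals[x] * y_totals[y] / n)
        for x in x_totals
        for y in y_totals
    )

def simulated_p_value(xs, ys, replicates=500, seed=0):
    """Monte Carlo p-value: permute ys, keeping both margins fixed."""
    rng = random.Random(seed)
    observed_stat = chi_squared_from_pairs(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)
        # Small tolerance so that ties with the observed value are counted.
        if chi_squared_from_pairs(xs, shuffled) >= observed_stat - 1e-9:
            exceed += 1
    return (1 + exceed) / (replicates + 1)

# Hypothetical toy data with a perfect association (not survey data).
xs = ["young"] * 20 + ["old"] * 20
ys = ["app"] * 20 + ["no app"] * 20
p_value = simulated_p_value(xs, ys)
print(p_value)
```

Unlike the asymptotic test, this approach does not rely on the chi-squared approximation, so it remains usable when many cells have small expected counts.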

Education - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and Education
X-squared = 224.72, df = 165, p-value = 0.001378

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and Education
X-squared = 224.72, df = NA, p-value = 0.07784

Gender - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and Gender
X-squared = 69.007, df = 66, p-value = 0.3761

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and Gender
X-squared = 69.007, df = NA, p-value = 0.3433

Home - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and Home
X-squared = 60.127, df = 66, p-value = 0.6804

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and Home
X-squared = 60.127, df = NA, p-value = 0.6567

App - Age


    Pearson's Chi-squared test

data:  App_Likely and AgeCategory
X-squared = 52.761, df = 36, p-value = 0.0353

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  App_Likely and AgeCategory
X-squared = 52.761, df = NA, p-value = 0.02994

Coffee knowledge - Age


    Pearson's Chi-squared test

data:  KnowledgeCoffee and AgeCategory
X-squared = 104.25, df = 36, p-value = 0.00000001471

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  KnowledgeCoffee and AgeCategory
X-squared = 104.25, df = NA, p-value = 0.003992
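
Because a simulated p-value is a ratio of counts, it can only take values k/501 when 500 replicates are used. The short Python sketch below, an illustration rather than part of the original analysis, recovers from the reported p-values how many simulated tables matched or exceeded the observed statistic. It assumes the p-value is computed as (1 + exceedances) / (replicates + 1); `exceedances` is a hypothetical helper name.

```python
REPLICATES = 500  # as stated in the test output

def exceedances(p_value, replicates=REPLICATES):
    """Number of simulated statistics >= the observed one, given the p-value."""
    return round(p_value * (replicates + 1)) - 1

# Simulated p-values copied from the test output in this section.
reported = {
    "AmountWeek and AgeCategory": 0.03194,
    "AmountWeek and Education": 0.07784,
    "App_Likely and AgeCategory": 0.02994,
    "KnowledgeCoffee and AgeCategory": 0.003992,
}
counts = {pair: exceedances(p) for pair, p in reported.items()}
smallest_possible = 1 / (REPLICATES + 1)
print(counts, round(smallest_possible, 6))
```

For KnowledgeCoffee and AgeCategory only 1 of the 500 simulated tables reached the observed statistic, so the reported 0.003992 is close to the smallest value this simulation can return (1/501, about 0.002). More replicates would be needed to resolve smaller p-values.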

Coffee knowledge - Purchase location


    Pearson's Chi-squared test

data:  KnowledgeCoffee and PurchaseLocation
X-squared = 50.617, df = 72, p-value = 0.9738

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  KnowledgeCoffee and PurchaseLocation
X-squared = 50.617, df = NA, p-value = 0.9281

Relationships


Regressions

Include categorical variables as dummies
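
Dummy coding turns a categorical variable with k levels into k - 1 indicator columns, with the omitted level acting as the reference category. The following Python sketch illustrates this for the Home variable. `dummy_code` is a hypothetical helper, and the choice of reference level is an assumption.

```python
def dummy_code(values, reference):
    """Return (column names, rows of 0/1 indicators), dropping `reference`."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Levels taken from the Home table; the responses themselves are made up.
home = ["Urban (City)", "Suburbs", "Rural (Town)", "Urban (City)"]
columns, rows = dummy_code(home, reference="Urban (City)")
print(columns)
for row in rows:
    print(row)
```

Each regression coefficient on a dummy column is then read as the difference from the reference level, here "Urban (City)".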

Cook's distance -> outliers
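
Cook's distance measures how much the fitted values change when a single observation is left out, combining the size of its residual with its leverage. A minimal pure-Python sketch for a regression line with one predictor follows. The data are hypothetical, `cooks_distances` is an illustrative name, and the 4/n cut-off is only a common rule of thumb.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a least-squares line y = a + b*x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point (diagonal of the hat matrix).
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        e ** 2 / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

# Hypothetical data: the last point lies far above the trend of the others.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]
distances = cooks_distances(x, y)
flagged = [i for i, d in enumerate(distances) if d > 4 / len(x)]
print(flagged)
```

A flagged observation is a candidate for closer inspection, not automatically an error. With several predictors the leverages come from the full hat matrix, which a statistics package would normally compute.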


Data problems